Abstract

Improving Importance Sampling estimators for rare event probabilities requires sharp approximations of
conditional densities. This is achieved for events $E_n := (u(X_1) + \dots + u(X_n)) \in A_n$ where the summands are
i.i.d. and $E_n$ is a large or moderate deviation event. The approximation of the conditional density of the vector
$(X_1, \dots, X_{k_n})$ with respect to $E_n$ on long runs, when $k_n/n \to 1$, is handled. The maximal value of $k_n$ compatible
with a given accuracy is discussed; simulated results are presented, which highlight the gain of the present approach
over classical IS schemes. Detailed algorithms are proposed.



1 Introduction and notation 
1.1 Motivation and context 

Importance Sampling procedures aim at reducing the calculation time which is necessary in order to evaluate 
integrals, often in large dimension. We consider the case when the integral to be numerically computed is the 
probability of an event defined by a large number of random components; this case has received quite a lot of 
attention, above all when the event is of small probability, typically of order 10~^ or so, as occurs frequently 
in industrial applications or in communication devices. The present paper proposes estimators for both large and
moderate deviation probabilities; the latter case is of interest for statistics. The situation which is considered is
the following.

The r.v.'s $X, X_1, \dots, X_n$ are i.i.d. with known common density $p_X$ on $\mathbb{R}$, and $u$ is a real valued measurable function
defined on $\mathbb{R}$. Define $U := u(X)$ with density $p_U$ and

$$U_{1,n} := \sum_{i=1}^{n} U_i.$$

We intend to estimate

$$P_n := P(U_{1,n} \in nA)$$

for large but fixed $n$, where

$$A := (a_n, \infty) \qquad (1)$$

and $a_n$ is a convergent sequence. The limit of this sequence either equals $EU$ or is assumed to be larger than $EU$.
In the first case it will be assumed that $a_n$ converges slowly, in such a way that $P(U_{1,n} \in nA)$ is not obtainable
through the central limit theorem; we may call this case a moderate deviation case. The second situation is
classically referred to as a large deviation case.

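The basic estimate of $P_n$ is defined as follows: generate $L$ i.i.d. samples $X_1^n(l)$ with underlying density $p_X$ and
define

$$P^{(n)}(\mathcal{E}_n) := \frac{1}{L}\sum_{l=1}^{L} \mathbb{1}_{\mathcal{E}_n}\left(X_1^n(l)\right)$$

where

$$\mathcal{E}_n := \{(x_1, \dots, x_n) \in \mathbb{R}^n : (u(x_1) + \dots + u(x_n)) \in nA\} \qquad (2)$$

with $u_i := u(x_i)$. The Importance Sampling estimator of $P_n$ with sampling density $g$ on $\mathbb{R}^n$ is

$$P_g^{(n)}(\mathcal{E}_n) := \frac{1}{L}\sum_{l=1}^{L} \hat{P}_n(l)\, \mathbb{1}_{\mathcal{E}_n}\left(Y_1^n(l)\right) \qquad (3)$$

where $\hat{P}_n(l)$ is called the "importance factor" and is written

$$\hat{P}_n(l) := \frac{\prod_{i=1}^{n} p_X(Y_i(l))}{g(Y_1^n(l))} \qquad (4)$$

and where the $L$ samples $Y_1^n(l) := (Y_1(l), \dots, Y_n(l))$ are i.i.d. with common density $g$.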

The problem of finding a good sampling density $g$ has been widely explored when $a_n = a$ is fixed and positive;
this is the large deviation case; see e.g. [Bucklew 2004]. The case when $a$ tends slowly to $E[u(X)]$ from above (the
moderate deviation case) is considered in [Ermakov 2007].

Under hypotheses to be recalled later, the classical IS scheme consists in the simulation of $n$ i.i.d. replications
$Y_1, \dots, Y_n$ with density $\pi^{a_n}$ on $\mathbb{R}$, and therefore $g(y_1, \dots, y_n) = \pi^{a_n}(y_1) \cdots \pi^{a_n}(y_n)$. The density $\pi^{a_n}$ is the so-called
tilted (or twisted) density at point $a_n$; when $a_n = a$ is fixed, $a$ is called the dominating point of the set
$(a, \infty)$; see [Bucklew 2004]. Although this terminology is usually reserved for the large deviation case, we
adopt it also in the moderate deviation one, for reasons to be stated later on.

This approach produces efficient IS schemes, in the sense that the computational burden necessary to obtain
a relative precision of the estimate with respect to $P_n$ does not grow exponentially as a function of $n$. It can be
proved that in the large deviation range the variance of the classical IS is proportional to $P_n^2 \sqrt{n}$.

The numerator in the importance factor is the product of the $p_X(Y_i)$'s while the denominator need not be a density
of i.i.d. copies evaluated on the $Y_i$'s. Indeed the optimal choice for $g$ is the density of $X_1^n := (X_1, \dots, X_n)$ conditioned
upon $(X_1^n \in \mathcal{E}_n)$, leading to a zero variance estimator. We will propose an IS sampling density which approximates
this conditional density very sharply on its first components $y_1, \dots, y_k$, where $k = k_n$ is very large, namely $k/n \to 1$.
This motivates the title of this paper.

Let us introduce a toy case in order to define the main step of the procedure, namely the simulation of a sample
under a proxy of the conditional density. Assume $X_1^n$ is a vector of $n$ i.i.d. standard normal real valued random
variables and $P_n := P(S_{1,n} > na)$ with $S_{1,n} := X_1 + \dots + X_n$ and $a > 0$.

1- For any $v > a$ the joint density $p_{nv}$ of $X_1, \dots, X_{n-1}$ conditionally upon $(S_{1,n} = nv)$ is known analytically and
simulation under $p_{nv}$ is easy for any $v$. A general form of this statement is Theorem 1, Section 2.

2- The optimal sampling density $g$ is similar to $p_{nv}$ with conditioning event $(S_{1,n} > na)$. The density $g$ is
obtained integrating $p_{nv}$ with respect to the conditional distribution of $S_{1,n}/n$ under $(S_{1,n} > na)$, which is well
approximated by an exponential distribution on $(a, \infty)$ with expectation $a + (1/na)$. The corresponding general
statement is Theorem 2, Section 2. Therefore samples under a proxy of $g$ are obtained through Monte Carlo
simulation as follows: draw $Y_1^n$ with density $p_{nV}$ where $V$ follows the just cited exponential density. Insert these
terms in the IS estimator repeatedly to get $P_g^{(n)}(\mathcal{E}_n)$.
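The two steps of the toy case can be sketched in a few lines of Python. This is a minimal illustration under the Gaussian assumptions stated above, not the general algorithm of the paper; the function names, the seed and the run count are hypothetical choices. It uses the fact that, for standard normal summands, the centred part of the sample has the same law under $p_X$ and under the conditioned scheme, so the importance factor reduces to the ratio of the densities of the empirical mean.

```python
import math
import random


def sample_conditioned_gaussian(n, v, rng):
    """Draw X_1..X_n i.i.d. N(0,1) conditioned on X_1 + ... + X_n = n*v."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_bar = sum(z) / n
    # Centring removes the empirical mean; adding v pins the mean to v.
    return [z_i - z_bar + v for z_i in z]


def toy_is_estimate(n, a, num_runs, seed=0):
    """IS estimate of P(S_{1,n} > n*a) for standard normal summands."""
    rng = random.Random(seed)
    rate = n * a  # exponential proxy on (a, inf) with expectation a + 1/(n*a)
    total = 0.0
    for _ in range(num_runs):
        v = a + rng.expovariate(rate)
        y = sample_conditioned_gaussian(n, v, rng)
        y_bar = sum(y) / n  # equals v up to rounding: every run hits the target set
        # Importance factor: density of the mean under N(0, 1/n) divided by
        # the density of the exponential proxy (Gaussian case only).
        numerator = math.sqrt(n / (2.0 * math.pi)) * math.exp(-n * y_bar * y_bar / 2.0)
        denominator = rate * math.exp(-rate * (y_bar - a))
        total += numerator / denominator
    return total / num_runs


def exact_tail(n, a):
    """Exact P(S_{1,n} > n*a) when S_{1,n} ~ N(0, n)."""
    return 0.5 * math.erfc(a * math.sqrt(n / 2.0))
```

Comparing `toy_is_estimate(100, 0.3, 2000)` with `exact_tail(100, 0.3)` gives a quick sanity check; all simulated runs fall in the target set, which is the hit-rate property discussed in this section.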



In the general case the joint distribution $p_{nv}$ cannot be approximated sharply on the very long run $1, \dots, n-1$,
but merely on $1, \dots, k_n$ with $k_n$ close to $n$. The approximation provided in Theorem 1 and, as a consequence, in
Theorem 2 is valid on the first $k_n$ coordinates; a precise tuning of $k_n$ is provided in Section 3. Since $v$ is simulated
on the whole set $(a, +\infty)$, no search is done in order to identify dominating points and no part of the target set
$(a, +\infty)$ is neglected in the simulation of runs; the example in Section 6, where the classical IS scheme is compared
to the present one, is illuminating in this respect.

The merits of an IS estimator are captured through a number of criteria:

1. The asymptotic variance of the estimate

2. The stability of the Importance Factor

3. The hit rate of the IS scheme, which is the number of times the target set is reached by the simulated samples

4. Some run time indicator.

Some mixed indices have been proposed (see [Glynn and Whitt 1992]), combining 1 and 4, with noticeable extension.
The present paper provides an improvement over classical IS schemes as measured by 1, 2 and 3 above,
as shown numerically on some examples. These improvements are also supported on a theoretical basis, following from the
quasi-optimality of the proposed IS scheme resulting from the approximation of the conditional density. When the
r.v.'s $U_i$ are real-valued, the present method might be costly. The toy case which we present in the simulation
study, pertaining to events $(|U_{1,n}| > na_n)$ for the $U_i$'s, proves however that the observed bias of the estimate through
i.i.d. IS sampling can be important for reasonable $L$, which does not happen with the present approach. Also the
hit rate of the present proposal is close to 100%.

The criterion which we consider is different from the variance, and results from an evaluation of the MSE of our
estimate on specific subsets of the runs generated by the sampling scheme, which we call typical subsets, namely
subsets having probability going to 1 under the sampling scheme as $n$ increases. On such sets, the MSE is proved to be
of very small order with respect to the variance of the classical estimate, which cannot be diminished on any such
typical subsets. It will be shown that the number of simulation runs necessary to achieve an $\alpha\%$
relative error on $P_n$ drops by a factor $\sqrt{n-k}/\sqrt{n}$ with respect to the classical IS scheme. Since $k$ is allowed to be
close to $n$, the resulting gain in variance is noticeable. Numerical evidence of this reduction in MSE is produced.
We also present a way of choosing the value of $k_n$ with respect to $n$ in such a way that the accuracy of the sampling
scheme with respect to the optimal one is controlled. This rule is also discussed numerically.

Alternative methods have been extensively developed for rare event simulation (see [Botev and Kroese 2010]
and references therein). The splitting technique relies on an ad hoc nested covering of the target set $A$ by sets $A_k$. It is assumed
that the conditional distribution of $U_{1,n}$ given $U_{1,n} \in nA_k$ is known. An ad hoc choice of the $A_k$'s leading to a
common value for the $P_k$'s provides efficient estimators for $P_n$, with small run-times. However in the present static
case the calculation of the conditional distribution is difficult, even in the real case, and requires a sharp asymptotic
analysis of large or moderate deviation probabilities.

It may seem that we could have reduced this paper to the case when $u$ is the identity function, hence simulating
runs $U_1^n := (u(X_1), \dots, u(X_n))$ under $(U_{1,n} > na)$. However it often occurs that the conditioning event is defined
through a joint set of conditions, say

$$u(X_1) + \dots + u(X_n) > na \qquad (5)$$

and

$$h(X_1^n) \in D_n \qquad (6)$$

for some function $h$ and some measurable set $D_n$. Clearly in most cases the approximation of the density of $X_1^k$ under
both constraints is intractable, and the approximation of the density of $X_1^k$ conditioned upon $(X_1^n \in \mathcal{E}_n)$ provides a
good IS sampling scheme for the estimation of

$$P\left(u(X_1) + \dots + u(X_n) > na \,\cap\, h(X_1^n) \in D_n\right).$$

A simple example is when the constraint is written

$$X_1^n \in D_n$$

and $D_n$ is included in a set defined through (5). The function $u$ and the value of $a$ may be fitted such that (5)
makes minimal the difference

$$P\left(u(X_1) + \dots + u(X_n) > na\right) - P\left(X_1^n \in D_n\right).$$

Our proposal therefore hinges on the local approximation of the conditional distribution of long runs $X_1^k$ from
$X_1^n$. This cannot be achieved through the classical theory of large deviations, nor through the moderate deviations
one, first developed by [de Acosta 1992] and more recently by [Ermakov 2007]. On the contrary, the ad hoc procedure
developed in the range of large deviations by [Diaconis and Freedman 1988] for the local approximation of the
conditional distribution of $X_1^k$ given the value of $(S_{1,n} := X_1 + \dots + X_n)$ is the starting point of the method leading
to the present approach. We rely on [Broniatowski and Caron 2011], where the basic approximation used in the
present paper can be found. A first draft in the direction of the present work is in [Broniatowski and Ritov 2009].

The present approach can be extended to the case of a multivariate constraint for a multidimensional problem,
i.e. when $x$, $u(x)$ and $a$ are vector-valued. This will not be considered here.

1.2 Notations and Assumptions 

The following notation and assumptions are kept throughout the paper without further reference. 
1.2.1 Conditional densities and their approximations 

Throughout the paper the value of a density $p_Z$ of some continuous random vector $Z$ at point $z$ may be written
$p_Z(z)$ or $p(Z = z)$, whichever is more convenient according to the context. The normal density function on $\mathbb{R}$
with mean $\mu$ and variance $\tau$ at $x$ is denoted $\mathfrak{n}(\mu, \tau, x)$.
Let $p_{nv}$ denote the density of $X_1^k$ under the local condition $(U_{1,n} = nv)$

$$p_{nv}(X_1^k = Y_1^k) := p(X_1^k = Y_1^k \mid U_{1,n} = nv) \qquad (7)$$

where $Y_1^k$ belongs to $\mathbb{R}^k$.

We will also consider the density $p_{nA}$ of $X_1^k$ conditioned upon $(U_{1,n} > na)$

$$p_{nA}(X_1^k = Y_1^k) := p(X_1^k = Y_1^k \mid U_{1,n} > na). \qquad (8)$$

The approximating density of $p_{nv}$ is denoted $g_{nv}$; the corresponding approximation of $p_{nA}$ is denoted $g_{nA}$.
Explicit formulas for those densities are presented in the next section.

1.2.2 Tilted densities and related quantities 

The real valued measurable function $u$ is assumed to be unbounded; standard transformations show that this
assumption is not restrictive. It is assumed that $U = u(X)$ has a density $p_U$ w.r.t. the Lebesgue measure on $\mathbb{R}$. We
also assume that the characteristic function of the random variable $U$ belongs to $L^r$ for some $r > 1$.
The r.v. $U$ is supposed to fulfill the Cramer condition: its moment generating function satisfies

$$\phi_U(t) := E \exp(tU) < \infty$$

for $t$ in a non void neighborhood of 0. Define the functions $m(t)$, $s^2(t)$ and $\mu_3(t)$ as the first, second and third
derivatives of $\log \phi_U(t)$, and denote by $m^{-1}$ the reciprocal function of $m$.
Denote

$$\pi_U^a(u) := \frac{\exp(tu)}{\phi_U(t)}\, p_U(u) \qquad (9)$$

with $m(t) = a$, where $a$ belongs to the support of $P_U$, the distribution of $U$. The density $\pi_U^a$ is the tilted density with
parameter $a$. It is also assumed that this latter definition of $t$ makes sense for all $a$ in the support of $U$. Conditions
on $\phi_U(t)$ which ensure this fact are referred to as steepness properties, and are exposed in [Barndorff-Nielsen 1978],
p. 153.

We also introduce the family of densities

$$\pi_u^a(x) := \frac{\exp(t u(x))}{\phi_U(t)}\, p_X(x) \qquad (10)$$

with $\Pi_u^a$ the associated distribution.
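As an illustration of these quantities, the following Python sketch works them out for one specific and purely illustrative choice: $U$ standard exponential, for which $\phi_U(t) = 1/(1-t)$ for $t < 1$, $m(t) = 1/(1-t)$ and $s^2(t) = 1/(1-t)^2$. The function names and the bisection bounds are assumptions of this sketch; the closed form $m^{-1}(a) = 1 - 1/a$ is available here and only serves as a check of the numerical inversion.

```python
import math


def m(t):
    """First derivative of log phi_U(t) = -log(1 - t), valid for t < 1."""
    return 1.0 / (1.0 - t)


def s2(t):
    """Second derivative of log phi_U(t), valid for t < 1."""
    return 1.0 / (1.0 - t) ** 2


def m_inverse(a, lo=-1.0e3, hi=1.0 - 1.0e-12, tol=1.0e-12):
    """Solve m(t) = a by bisection; m is increasing on (-inf, 1)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m(mid) < a:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def tilted_density(x, a):
    """Tilted density exp(t*x) / phi_U(t) * p_U(x) with m(t) = a."""
    if x < 0.0:
        return 0.0
    t = m_inverse(a)
    # 1 / phi_U(t) = 1 - t and p_U(x) = exp(-x) for the standard exponential.
    return (1.0 - t) * math.exp(t * x) * math.exp(-x)
```

For this example the tilted density with parameter $a$ is again exponential, with mean $a$, which is the usual interpretation of tilting: the law is shifted so that its expectation sits at the point of interest.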

1.2.3 Specific sequences 

The sequence a„ is introduced in the paper. For notational convenience its current terms will be denoted a without 
referring to the subscript n. 

2 Conditioned samples 

The starting point is the approximation of $p_{nv}$ defined in (7) on $\mathbb{R}^k$ for large values of $k$ under the point condition

$$(U_{1,n} = nv)$$

when $v$ belongs to $(a, \infty)$. We refer to [Broniatowski and Caron 2011] for this result.
We introduce a positive sequence $\epsilon_n$ which satisfies

$$\lim_{n \to \infty} \epsilon_n \sqrt{n-k} = \infty \qquad (E1)$$

$$\lim_{n \to \infty} \epsilon_n (\log n)^2 = 0. \qquad (E2)$$

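Define a density $g_{nv}(y_1^k)$ on $\mathbb{R}^k$ as follows. Set

$$g_0(y_1 \mid y_0) := \pi_u^v(y_1) \qquad (11)$$

with $y_0$ arbitrary and, for $1 \le i \le k-1$, define $g(y_{i+1} \mid y_1^i)$ recursively.
Set $t_i$ the unique solution of the equation

$$m_i := m(t_i) = \frac{n}{n-i}\left(v - \frac{u_{1,i}}{n}\right) \qquad (12)$$

where $u_{1,i} := u(y_1) + \dots + u(y_i)$.
Define

$$g(y_{i+1} \mid y_1^i) = C_i\, p_X(y_{i+1})\, \mathfrak{n}(\alpha\beta + v, \beta, u(y_{i+1})) \qquad (13)$$

where $C_i$ is a normalizing constant. Here

$$\alpha := t_i + \frac{\mu_3(t_i)}{2 s^4(t_i)(n-i-1)} \qquad (14)$$

$$\beta := s^2(t_i)(n-i-1). \qquad (15)$$

Set

$$g_{nv}(y_1^k) := g_0(y_1 \mid y_0) \prod_{i=1}^{k-1} g(y_{i+1} \mid y_1^i). \qquad (16)$$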
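The bookkeeping behind (12) can be sketched as follows. The quantity $m_i$ is the value the remaining $n-i$ summands must average to once $u_{1,i}$ has been observed, so that $(n-i)\, m_i + u_{1,i} = nv$ at every step. The Python function below is a minimal sketch with hypothetical names; it only tracks the $m_i$'s and leaves the inversion $t_i = m^{-1}(m_i)$ and the densities aside.

```python
def conditional_means(u_values, n, v):
    """Return [m_0, ..., m_k] with m_i = n / (n - i) * (v - u_{1,i} / n).

    u_values holds u(y_1), ..., u(y_k) for a run of length k < n.
    """
    means = []
    partial_sum = 0.0  # u_{1,i}, the sum of the first i values
    for i in range(len(u_values) + 1):
        means.append(n / (n - i) * (v - partial_sum / n))
        if i < len(u_values):
            partial_sum += u_values[i]
    return means
```

With `u_values = [1.0, 2.0, 0.5]`, `n = 10` and `v = 1.2`, the first entry is $m_0 = v$ and each later entry adjusts the target mean to compensate for what the run has already contributed.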

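Theorem 1 Assume (E1) and (E2). Then (i)

$$p_{nv}(X_1^k = Y_1^k) = g_{nv}(Y_1^k)\left(1 + o_{P_{nv}}(\epsilon_n (\log n)^2)\right) \qquad (17)$$

and (ii)

$$p_{nv}(X_1^k = Y_1^k) = g_{nv}(Y_1^k)\left(1 + o_{G_{nv}}(\epsilon_n (\log n)^2)\right). \qquad (18)$$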

The approximation stated in statement (i) above holds on typical paths generated under the conditional
density $p_{nv}$; in the same way, statement (ii) holds under the sampling scheme $g_{nv}$. Therefore they do not hold
on the entire space $\mathbb{R}^k$, which would require more restrictive hypotheses on the characteristic function of $u(X_1)$;
see [Diaconis and Freedman 1988] for such conditions in the case when $k$ is allowed to grow slowly with respect
to $n$ and $a$ is fixed. However the above theorem provides optimal approximations on the entire space $\mathbb{R}^k$ for all
$k$ between 1 and $n-1$ in the gaussian case with $u(x) = x$, since $g_{nv}(y_1^k)$ coincides with the conditional density.
As stated in [Broniatowski and Caron 2011], the extension of our results from typical paths to the whole space
$\mathbb{R}^k$ holds: convergence of the relative error on large sets implies that the total variation distance between the
conditioned measure and its approximation goes to 0 on the entire space. So our results provide an extension of
[Diaconis and Freedman 1988] and [Dembo and Zeitouni 1996], who considered the case when $k$ is of small order
with respect to $n$; the conditions which are assumed in the present paper are weaker than those assumed in the just
cited works; however, in contrast with their results, we do not provide explicit rates for the convergence to 0 of the
total variation distance on $\mathbb{R}^k$.

As stated above, the optimal choice for the sampling density is $p_{nA}$, for which we state an approximation result,
extending Theorem 1.

We state the approximating density for $p_{nA}$ defined in (8). It holds

$$p_{nA}(x_1^k) = \int_a^{\infty} p_{nv}(X_1^k = x_1^k)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \qquad (19)$$

so that, in contrast with the classical IS approach for this problem, we do not consider the dominating point
approach but merely realize a sharp approximation of the integrand at any point of the domain $(a, \infty)$ and consider
the dominating contribution of all those distributions in the evaluation of the conditional density $p_{nA}$. A similar
point of view has been considered in [Barbe and Broniatowski 2004] for sharp approximations of Laplace type
integrals in $\mathbb{R}^d$.

The approximation of $p_{nA}$ is handled on some small interval $(a, a+c)$, thus on the principal part of this integral.
Let $c_n$ denote a positive sequence such that (C)

lim„_>.oo ncnm^^{a) = oo 
ncn 

sup 7 < oo 

„>i (n - k) 
and denote $c$ the current term of the sequence $c_n$.
Denote (A) the following set of conditions



$$\lim_{n \to \infty} (n-k) \left(m^{-1}(a)\right)^2 = \infty$$



m ^ (a) 
lim = c» 



which trivially holds when $\lim_{n \to \infty} a_n > EU$.
Define on $\mathbb{R}^k$ the density



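$$g_{nA}(y_1^k) := \frac{n m^{-1}(a) \int_a^{a+c} g_{nv}(y_1^k) \exp\left(-n m^{-1}(a)(v-a)\right) dv}{1 - \exp\left(-n m^{-1}(a)\, c\right)}. \qquad (20)$$

The density

$$\frac{n m^{-1}(a) \exp\left(-n m^{-1}(a)(v-a)\right) \mathbb{1}_{(a, a+c)}(v)}{1 - \exp\left(-n m^{-1}(a)\, c\right)} \qquad (21)$$

which appears in (20) approximates $p(U_{1,n}/n = v \mid a < U_{1,n}/n < a+c)$.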
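Drawing $v$ from the truncated exponential density (21) is straightforward by inversion of its distribution function. The sketch below is one possible way to do it in Python; the function name is hypothetical and the argument `rate` stands for $n m^{-1}(a)$, which the caller is assumed to have computed.

```python
import math
import random


def sample_truncated_exponential(a, c, rate, rng):
    """Draw v on [a, a + c) with density proportional to exp(-rate * (v - a))."""
    u = rng.random()  # uniform on [0, 1)
    mass = -math.expm1(-rate * c)  # 1 - exp(-rate * c), the normalizing mass
    # Inverse c.d.f.: solve (1 - exp(-rate * (v - a))) / mass = u for v.
    return a - math.log1p(-u * mass) / rate
```

When `rate * c` is large the truncation is barely felt and the draws have mean close to `a + 1 / rate`, in line with the exponential proxy described in the toy case of the Introduction.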

The variance function $V$ of the distribution of $U$ is defined on the span of $U$ through

$$v \mapsto V(v) := s^2(m^{-1}(v)).$$

Denote (V) the condition 

sup I \/nm^^{a) I V {v) (exp— nm~^(a) {v — a)) dv j < oo. ((V)) 

n>l \ Ja J 

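Theorem 2 Assume (A), (C), (V), (E1) and (E2). Then (i)

$$p_{nA}(X_1^k = Y_1^k) = g_{nA}(Y_1^k)\left(1 + o_{P_{nA}}(\delta_n)\right) \qquad (22)$$

and (ii)

$$p_{nA}(X_1^k = Y_1^k) = g_{nA}(Y_1^k)\left(1 + o_{G_{nA}}(\delta_n)\right) \qquad (23)$$

where

$$\delta_n := \max\left(\epsilon_n (\log n)^2,\ \left(\exp(-n c\, m^{-1}(a))\right)^{\delta}\right) \qquad (24)$$

for any positive $\delta < 1$.

The proof of Theorem 2 is deferred to the Appendix.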

Remark 3 Most distributions used in statistics satisfy (V); numerous papers have focused on the properties of
variance functions and classification of distributions, see e.g. [Letac and Mora 1990] and references therein.

Remark 4 When $a$ is fixed, the set of conditions (A) holds. In the case where $a = a_n$ converges to $EU$, the set of
conditions (A) does not cover the CLT zone. Indeed, the first condition of (A) implies that $m^{-1}(a)$ satisfies, for
some $\delta > 0$,

|TO-i(a)ni/2+'5| < oo. 



Besides this limitation, choosing $k$ and $\epsilon_n$ according to (A), (C), (E1) and (E2) is always possible. The more slowly $a_n$
converges to $EU$, the larger $k$ can be chosen with respect to $n$.
3 How far is the approximation valid?



This section provides a rule leading to an effective choice of the crucial parameter $k = k_n$ in order to achieve a given
accuracy bound for the relative error committed substituting $p_{nA}$ by $g_{nA}$. The larger $k$, the better the estimate of
the rare event probability. We consider the large deviation case, assuming $a$ fixed.
The accuracy of the approximation is measured through

$$ERE(k) := E_{G_{nA}}\left(\mathbb{1}_{D_k}(Y_1^k)\, \frac{p_{nA}(Y_1^k) - g_{nA}(Y_1^k)}{p_{nA}(Y_1^k)}\right) \qquad (25)$$

and

$$VRE(k) := Var_{G_{nA}}\left(\mathbb{1}_{D_k}(Y_1^k)\, \frac{p_{nA}(Y_1^k) - g_{nA}(Y_1^k)}{p_{nA}(Y_1^k)}\right) \qquad (26)$$

respectively the expectation and the variance of the relative error of the approximating scheme when evaluated on

$$D_k := \left\{ y_1^k \in \mathbb{R}^k \text{ such that } \left| g_{nA}(y_1^k)/p_{nA}(y_1^k) - 1 \right| < \delta_n \right\}$$

with $\epsilon_n (\log n)^2 \to 0$ and $\delta_n \to 0$; therefore $G_{nA}(D_k) \to 1$. The r.v.'s $Y_1^k$ are sampled under $g_{nA}$. Note that
the density $p_{nA}$ is usually unknown. The argument is somewhat heuristic and informal; nevertheless the rule is
simple to implement and provides good results. We assume that the set $D_k$ can be substituted by $\mathbb{R}^k$ in the above
formulas, therefore assuming that the relative error has bounded variance, which would require quite a lot of work
to be proved under appropriate conditions, but which seems to hold, at least in all cases considered by the authors.
We keep the above notation, omitting therefore any reference to $D_k$.

Consider a two-sigma confidence bound for the relative accuracy for a given $k$, defining

$$CI(k) := \left[ ERE(k) - 2\sqrt{VRE(k)},\ ERE(k) + 2\sqrt{VRE(k)} \right].$$

Let $\delta$ denote an acceptance level for the relative accuracy. Accept $k$ until $\delta$ belongs to $CI(k)$. For such $k$ the
relative accuracy is certified up to the level 5% roughly.
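The acceptance rule above can be sketched as follows in Python. This is an illustration only: the function names are hypothetical, `interval_for_k` stands for whatever procedure returns the two bounds of $CI(k)$ (or of a proxy of it), and the choice to return the first $k$ at which $\delta$ enters the interval is one reading of the rule.

```python
import math
import statistics


def two_sigma_interval(relative_errors):
    """Two-sigma interval [mean - 2*sd, mean + 2*sd] from sampled relative errors."""
    mean = statistics.fmean(relative_errors)
    half_width = 2.0 * math.sqrt(statistics.variance(relative_errors))
    return mean - half_width, mean + half_width


def first_k_reaching_level(interval_for_k, delta, k_max):
    """Increase k from 1 and stop as soon as delta belongs to CI(k)."""
    for k in range(1, k_max + 1):
        lower, upper = interval_for_k(k)
        if lower <= delta <= upper:
            return k
    return k_max  # delta was never reached: every k up to k_max is accepted
```

In practice `interval_for_k` would wrap a Monte Carlo evaluation of the relative error for runs of length $k$, which is the costly part of the procedure.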

In [Broniatowski and Caron 2011], a similar question is addressed and a proxy of the curve $\delta \to k_\delta$ is provided in
order to define the maximal $k$ leading to a given relative accuracy under the point condition $(U_{1,n} = na)$, namely
when $p_{nA}$ is replaced by $p_{na}$ and $g_{nA}$ by $g_{na}$.

Consider the ratio $g_{nA}(Y_1^k)/p_{nA}(Y_1^k)$ and use Cauchy's mean value theorem to obtain

$$\frac{g_{nA}(Y_1^k)}{p_{nA}(Y_1^k)} = \frac{\int_a^{a+c} g_{nv}(Y_1^k) \exp\left(-n m^{-1}(a)(v-a)\right) dv}{\int_a^{a+c} p_{nv}(X_1^k = Y_1^k) \exp\left(-n m^{-1}(a)(v-a)\right) dv}\left(1 + o_{G_{nA}}(1)\right) = \frac{g_{n\alpha}(Y_1^k)}{p_{n\alpha}(X_1^k = Y_1^k)}\left(1 + o_{G_{nA}}(1)\right)$$

for some $\alpha$ between $a$ and $a + c$. Since $a$ and $c$ are fixed, eventually small, it is reasonable to substitute $\alpha$ by
$a$ in order to evaluate the accuracy of the approximation. We thus inherit the relative efficiency curve in
[Broniatowski and Caron 2011], to which we refer for definition and derivation.

We briefly state the necessary steps required for the calculation of the graph of a proxy of $k \to CI(k)$.

Introduce 

"7r^(a)" 



and 



D 



N := 



Pv{a) 



Pv (nik) 



n (n-k) 



7 



with $m_k$ defined in (12). Define $t$ by $m(t) = a$ and $t_k$ by $m(t_k) = m_k$. Define



A{Y^) 



n-k gnA{Y^)\ (N\ s-'it^) 



n \P^{Y^) J \Dj s^t) ■ 
Simulate $L$ i.i.d. samples $Y_1^k(l)$, each one made of $k$ i.i.d. replications under $p_X$; set

L 

L 



(27) 



1=1 



We use the same approximation for B. Define 



B {Y,^) := 



k 9nA{Y^)\ (N\ s^t^) 



and 



with the same Yi{l)'s as above. 
Set 



(28) 



1=1 



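$$\widehat{VRE}(k) := \hat{A} - \hat{B}^2$$

which is a fair approximation of $VRE(k)$.

In the same way a proxy for $ERE$ is defined through

$$\widehat{ERE}(k) := 1 - \hat{B}.$$

A proxy of $CI(k)$ can now be defined through

$$\widehat{CI}(k) := \left[ \widehat{ERE}(k) - 2\sqrt{\widehat{VRE}(k)},\ \widehat{ERE}(k) + 2\sqrt{\widehat{VRE}(k)} \right]. \qquad (29)$$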
We now check the validity of the above approximation, comparing $\widehat{CI}(k)$ with $CI(k)$ on a toy case. Detailed
algorithms leading to effective procedures are exposed in the next section.

Consider $u(x) = x$. The case when $p_X$ is a centered exponential distribution with variance 1 allows for an explicit
evaluation of $CI(k)$ making no use of Lemma 11. The conditional density $p_{nv}$ is calculated analytically, the density
$g_{nv}$ is obtained through (16), hence providing a benchmark for our proposal. The terms $\hat{A}$ and $\hat{B}$ are obtained by
Monte Carlo simulation following the algorithm presented hereunder. Figures 1, 2 and 3, 4 show the increase in $\delta$
w.r.t. $k$ in the large deviation range, with $a$ such that $P_n := P(S_{1,n} > na)$ ~ 10~*. We have considered two cases,
when $n = 100$ and when $n = 1000$. These figures show that the approximation scheme is quite accurate, since the
relative error is fairly small even in very high dimension spaces. They also show that $\widehat{ERE}$ and $\widehat{CI}$ provide good
tools for assessing the value of $k$. Denote $P_n := P(S_{1,n} > na)$.



4 The new Estimator and the algorithms 
4.1 Adaptive IS Estimator for rare event probability 

The IS scheme produces samples $Y_1^k := (Y_1, \dots, Y_k)$ distributed under $g_{nA}$, which is a continuous mixture of densities
$g_{nv}$ as in (16), with exponential mixing measure with parameter $n m^{-1}(a)$ on $(a, \infty)$

$$\mathbb{1}_{(a, \infty)}(v)\, n m^{-1}(a) \exp\left[-n m^{-1}(a)(v-a)\right]. \qquad (30)$$



Since all IS schemes produce unbiased estimators, and since the truncation parameter $c$ in (20) is immaterial,
we consider untruncated versions of $g_{nA}$ defined in (20), integrating on $(a, \infty)$ instead of $(a, a+c)$. This avoids a
number of computational and programming questions, a difficult choice of an extra parameter $c$, and does not
change the numerical results; this point has been checked carefully by the authors. We keep the notation $g_{nA}$ for
the untruncated density.

Figures 1 to 4: $ERE(k)$ (solid line) along with upper and lower bounds of $CI(k)$ (dotted lines) as a function of $k$,
with $n = 100$ (Figures 1 and 2) and $n = 1000$ (Figures 3 and 4), each for $a$ such that $P_n$ has a prescribed small value.

The density $g_{nA}$ is extended from $\mathbb{R}^k$ onto $\mathbb{R}^n$, completing the $n-k$ remaining coordinates with i.i.d. copies of
r.v.'s $Y_{k+1}, \dots, Y_n$ with common tilted density

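$$g_{nA}(y_{k+1}^n \mid y_1^k) := \prod_{i=k+1}^{n} \pi_u^{m_k}(y_i) \qquad (31)$$

with $m_k := m(t_k) = \frac{n}{n-k}\left(v - \frac{u_{1,k}}{n}\right)$ as in (12) and

$$u_{1,k} := \sum_{i=1}^{k} u(y_i).$$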



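The last $n-k$ r.v.'s $Y_i$'s are therefore drawn according to the classical i.i.d. scheme, in phase with the [Sadowsky and Bucklew 1990]
or [Ermakov 2007] schemes in the large or moderate deviation setting.
We now define our IS estimator of $P_n := P(U_{1,n} > na)$.
Let $Y_1^n(l) := Y_1(l), \dots, Y_n(l)$ be generated under $g_{nA}$. Let

$$\hat{P}_n(l) := \frac{\prod_{i=1}^{n} p_X(Y_i(l))}{g_{nA}(Y_1^n(l))}\, \mathbb{1}_{\mathcal{E}_n}(Y_1^n(l)) \qquad (32)$$

and define

$$\hat{P}_n := \frac{1}{L}\sum_{l=1}^{L} \hat{P}_n(l) \qquad (33)$$

in accordance with (3).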
4.2 Algorithms 

First, we present a series of three algorithms (Algorithms 1, 2 and 3) which produce the curve $k \to RE(k)$. The
resulting $k = k_\delta$ is the longest size of the runs which makes $g_{nA}$ a good proxy for $p_{nA}$.

The calculation of $g_{nv}(y_1^k)$ above requires the value of

$$C_i = \left( \int p_X(x)\, \mathfrak{n}(\alpha\beta + v, \beta, u(x))\, dx \right)^{-1}.$$

This can be done through Monte Carlo simulation. The value of $M$ need not be very large.
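A minimal Monte Carlo sketch of this normalizing constant is given below. It assumes that one can sample from $p_X$ (the hypothetical callable `sample_x`) and that the mean and variance of the normal factor have already been computed; it simply averages the normal density evaluated at $u(X_j)$ over simulated points $X_j$ and inverts the result. All names are illustrative.

```python
import math
import random


def normal_density(mean, variance, x):
    """Density at x of the normal law with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def mc_normalizing_constant(sample_x, u, mean, variance, num_points, rng):
    """Estimate C = 1 / E[n(mean, variance, u(X))] with X drawn from p_X."""
    total = 0.0
    for _ in range(num_points):
        total += normal_density(mean, variance, u(sample_x(rng)))
    return num_points / total
```

When $p_X$ is standard normal and $u(x) = x$ the integral is known in closed form (it is the density at `mean` of a centered normal law with variance `1 + variance`), which gives a convenient check of the routine.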

Remark 5 Solving $t_i = m^{-1}(m_i)$ might be difficult. It may happen that the reciprocal function of $m$ is at hand,
but even when $p_X$ is the Weibull density and $u(x) = x$, such is not the case. We can replace step * by

(to {ti)+Ui) 
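Input : $y_1^k$, $p_X$, $n$, $v$
Output : $g_{nv}(y_1^k)$
Initialization:
    $t_0 \leftarrow m^{-1}(v)$;
    $g_0(y_1 \mid y_0) \leftarrow \pi_u^v(y_1)$;
    $u_{1,1} \leftarrow u(y_1)$;
Procedure
    for $i \leftarrow 1$ to $k-1$ do
        $m_i \leftarrow$ (12);
        $t_i \leftarrow m^{-1}(m_i)$ *;
        $\alpha \leftarrow$ (14);
        $\beta \leftarrow$ (15);
        Calculate $C_i$;
        $g(y_{i+1} \mid y_1^i) \leftarrow$ (13);
        $u_{1,i+1} \leftarrow u_{1,i} + u(y_{i+1})$;
    end
    Compute $g_{nv}(y_1^k) \leftarrow$ (16)
Return : $g_{nv}(y_1^k)$

Algorithm 1: Evaluation of $g_{nv}(y_1^k)$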



Input : $p_X$, $k$, $a$, $M$
Output : $g_{nA}(y_1^n)$
Procedure
    for $m \leftarrow 1$ to $M$ do
        Simulate $v_m$ with density (30);
        Calculate $g_{nv_m}(y_1^k)$ with Algorithm 1;
        Calculate $g_{nv_m}(y_{k+1}^n \mid y_1^k) \leftarrow$ (31);
        Calculate $g_{nv_m}(y_1^n) \leftarrow g_{nv_m}(y_1^k)\, g_{nv_m}(y_{k+1}^n \mid y_1^k)$
    end
    Compute $g_{nA}(y_1^n) \leftarrow \frac{1}{M} \sum_{m=1}^{M} g_{nv_m}(y_1^n)$;
Return : $g_{nA}(y_1^n)$

Algorithm 2: Evaluation of $g_{nA}(y_1^n)$
Indeed since

m(ti+i) - m{ti) = : {m{U) + Ui) 

n — I 

with $u_i := u(y_i)$, use a first order approximation to derive that $t_{i+1}$ can be substituted by a proxy defined through

In the moderate deviation scale the function s^(.) does not vary from 1 and the above approximation is fair. For 
the large deviation case, the same argument applies, since s'^{ti) keeps close to s'^(t"'). 



Input : $p_X$, $\delta$, $n$, $a$, $L$
Output : $k_\delta$
Initialization: $k = 1$
Procedure
    while $\delta \notin \widehat{CI}(k)$ do
        for $l \leftarrow 1$ to $L$ do
            Simulate $Y_1^k(l)$ i.i.d. with density $p_X$;
            $A(Y_1^k(l)) :=$ (27) using Algorithm 2;
            $B(Y_1^k(l)) :=$ (28) using Algorithm 2;
        end
        Calculate $\widehat{CI}(k) \leftarrow$ (29);
        $k := k + 1$;
    end
Return : $k_\delta := k$

Algorithm 3: Calculation of $k_\delta$



The next Algorithms 4, 5 and 6 provide the estimate of $P_n$.

The following algorithm provides a simple acceptance/rejection simulation tool for $Y_{i+1}$ with density $g(y_{i+1} \mid y_1^i)$.
Denote $\mathfrak{N}$ the c.d.f. of a normal variate with parameter $(\mu, \sigma^2)$, and $\mathfrak{N}^{-1}$ its inverse.



Input : $p$, $\mu$, $\sigma^2$
Output : $Y$
Initialization:
    Select a density $f$ on $[0, 1]$ and
    a positive constant $K$ such that
    $p(\mathfrak{N}^{-1}(x)) \le K f(x)$ for all $x$ in $[0, 1]$
Procedure
    repeat
        Simulate $X$ with density $f$;
        Simulate $U$ uniform on $[0, 1]$ independent of $X$;
        Compute $Z := K U f(X)$;
    until $Z \le p(\mathfrak{N}^{-1}(X))$
Return : $Y := \mathfrak{N}^{-1}(X)$

Algorithm 4: Simulation of $Y$ with density proportional to $p(x)\, \mathfrak{n}(\mu, \sigma^2, x)$
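The acceptance/rejection step can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions that are not part of the algorithm itself: the density $f$ is taken uniform on $[0, 1]$ and $K$ is an upper bound `p_max` of $p$ supplied by the caller; the function name is hypothetical. The normal quantile function comes from `statistics.NormalDist` in the standard library.

```python
import math
import random
from statistics import NormalDist


def sample_p_times_normal(p, p_max, mu, sigma2, rng):
    """Draw Y with density proportional to p(y) * n(mu, sigma2, y).

    Assumes p(y) <= p_max for all y; f is the uniform density on [0, 1].
    """
    normal = NormalDist(mu, math.sqrt(sigma2))
    while True:
        x = rng.random()
        if x == 0.0:
            continue  # the quantile function is undefined at 0
        y = normal.inv_cdf(x)  # y is distributed as N(mu, sigma2)
        # Accept y with probability p(y) / p_max.
        if rng.random() * p_max <= p(y):
            return y
```

As a check, taking $p$ to be the standard normal density with $\mu = 1$ and $\sigma^2 = 1$ yields draws from the normal law with mean 1/2 and variance 1/2, since the product of the two Gaussian densities is proportional to that density.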



Remark 6 The paper [Barbe and Broniatowski 1999] can be used in order to simulate $Y_1$.
Remark 7 $\pi^{m_k}$ is defined as in (31), with $m_k$ as in (12),

$$m_k := \frac{n}{n-k}\left(v_l - \frac{u_{1,k}}{n}\right)$$

and

$$u_{1,k} = \sum_{i=1}^{k} u(Y_i(l)).$$

i=i 
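Input : $p_X$, $\delta$, $n$, $v$
Output : $Y_1^k$
Initialization:
    Set $k \leftarrow k_\delta$ with Algorithm 3;
    $t_0 \leftarrow m^{-1}(v)$;
Procedure
    Simulate $Y_1$ with density $\pi_u^v$;
    $u_{1,1} \leftarrow u(Y_1)$;
    for $i \leftarrow 1$ to $k-1$ do
        $m_i \leftarrow$ (12);
        $t_i \leftarrow m^{-1}(m_i)$;
        $\alpha \leftarrow$ (14);
        $\beta \leftarrow$ (15);
        Simulate $Y_{i+1}$ with density $g(y_{i+1} \mid y_1^i)$ using Algorithm 4;
        $u_{1,i+1} \leftarrow u_{1,i} + u(Y_{i+1})$;
    end
Return : $Y_1^k$

Algorithm 5: Simulation of a sample $Y_1^k$ with density $g_{nv}$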



Input : $\delta$, $n$, $a$, $M$, $L$
Output : $\hat{P}_n$
Initialization:
    Set $k \leftarrow k_\delta$ with Algorithm 3;
Procedure
    for $l \leftarrow 1$ to $L$ do
        Simulate $v_l$ with density (30);
        Simulate $Y_1^k(l)$ with density $g_{nv_l}$ with Algorithm 5;
        Simulate $Y_{k+1}^n(l)$ i.i.d. with density $\pi^{m_k}$;
        Calculate $g_{nA}(Y_1^n(l))$ with Algorithm 2;
        Calculate $\hat{P}_n(l) \leftarrow$ (32);
    end
    Compute $\hat{P}_n \leftarrow$ (33);
Return : $\hat{P}_n$

Algorithm 6: Calculation of $\hat{P}_n$
5 Compared efficiencies of IS estimators



The situation which we face with our proposal does not allow us to provide an order of magnitude of the variance
of our IS estimate, since the properties necessary to define it have been obtained only on typical paths under the
sampling density $g_{nA}$ and not on the whole space $\mathbb{R}^n$. This leads to a quasi-MSE measure for the performance
of the proposed estimator, which quantifies the variability evaluated on classes of subsets of $\mathbb{R}^n$ whose probability
goes to 1 under the sampling $g_{nA}$. Not surprisingly the loss of performance with respect to the optimal sampling
density is due to the $n-k$ last i.i.d. simulations, leading to a quasi-MSE of the estimate proportional to $\sqrt{n-k}$.



5.1 The efficiency of the classical IS scheme 

We first recall the definition of the classical IS sampling scheme and its asymptotic performance. The r.v.'s $Y_i$'s in
(3) are i.i.d. and have density $g = \pi^a$, hence with $m(t) = a$. See [Sadowsky and Bucklew 1990] in the LDP case
and [Ermakov 2007] in the MDP case. The reason for this sampling scheme is the fact that in the large deviation
case, $a$ is the "dominating point" of the set $(a, \infty)$, i.e. $a$ is such that the proxy of the conditional distribution of
$X_1$ given $(U_{1,n} > na)$ is $\Pi^a$; this is the basic form of the Gibbs conditioning principle.

Although developed for the large deviation case, the classical IS applies for the moderate deviation case since
for $a \to E[u(X)]$ and $(a - E[u(X)])\sqrt{n} \to \infty$ it holds

$$P(X_1 \in B \mid U_{1,n} > na) = (1 + o(1))\, \Pi^a(B) \qquad (35)$$

for any Borel set $B$ as $n \to \infty$. This follows as a consequence of the Sanov Theorem for moderate deviations (see
[Ermakov 2007] and [de Acosta 1992]) and justifies the classical IS scheme in this range.

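The classical IS is defined simulating $L$ times a random sample of $n$ i.i.d. r.v.'s $Y_1^n(l)$, $1 \le l \le L$, with tilted
density $\pi^a$. The standard IS estimate is defined through

$$\overline{P}_n := \frac{1}{L}\sum_{l=1}^{L} \mathbb{1}_{\mathcal{E}_n}(Y_1^n(l))\, \frac{\prod_{i=1}^{n} p_X(Y_i(l))}{\prod_{i=1}^{n} \pi^a(Y_i(l))}$$

where the $Y_i(l)$ are i.i.d. with density $\pi^a$ and $\mathbb{1}_{\mathcal{E}_n}(Y_1^n(l))$ is as in (3). Set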

The variance of P„ is given by 

The relative accuracy of the estimate P„ is defined through 
The following result holds. 

Proposition 8 The relative accuracy of the estimate $\overline{P}_n$ is given by

$$RE(\overline{P}_n) = \frac{\sqrt{2\pi}\sqrt{n}}{L}\, a\, (1 + o(1))$$

as $n$ tends to infinity.
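For comparison purposes, the classical i.i.d. tilted scheme recalled in this subsection can be sketched in Python in the standard normal case with $u(x) = x$, where the tilted density at $a$ is the $N(a, 1)$ density and the importance factor equals $\exp(-a S_{1,n} + n a^2/2)$. The function names and default seed are hypothetical, and the sketch is only meant to illustrate the estimator, not to reproduce the simulations of the paper.

```python
import math
import random


def classical_is_estimate(n, a, num_runs, seed=0):
    """Classical IS estimate of P(S_{1,n} > n*a) for standard normal summands."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_runs):
        # n i.i.d. draws from the tilted density, here N(a, 1).
        s = sum(rng.gauss(a, 1.0) for _ in range(n))
        if s > n * a:  # indicator of the target set
            # prod p_X(Y_i) / prod pi^a(Y_i) = exp(-a * s + n * a^2 / 2)
            total += math.exp(-a * s + n * a * a / 2.0)
    return total / num_runs


def exact_tail(n, a):
    """Exact P(S_{1,n} > n*a) when S_{1,n} ~ N(0, n)."""
    return 0.5 * math.erfc(a * math.sqrt(n / 2.0))
```

Only about half of the simulated runs reach the target set under this scheme, since the tilted sum is centred at $na$; this is the hit-rate limitation of the classical approach mentioned in the Introduction.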

We will prove that no reduction of the variance of the estimator can be achieved on subsets $B_n$ of $\mathbb{R}^n$ such that

The easy case when $U_1, \dots, U_n$ are i.i.d. with standard normal distribution and $u(x) = x$ is sufficient for our
need.

The variance of the IS estimate of $P(U_{1,n} > na)$ is proportional to



VarPn = ^{Eua {PJDY -Pi 



Ep^jt(a,oo) ( ~~) (^^P~^ ) (exp-aUi,„) - P^ 
A set $B_n$ reducing the MSE should penalize large values of $-(U_1 + \dots + U_n)$ while bearing nearly all
the realizations of $U_1 + \dots + U_n$ under the i.i.d. sampling scheme $\pi^a$ as $n$ tends to infinity. It should therefore be
of the form $(b, \infty)$ for some $b = b_n$ so that



with a clear gain over the variance indicator. However when b < a, (b) does not hold and, when b > a, (a) does not 
hold. 

So no reduction of this variance can be obtained by taking into account the properties of the typical paths
generated under the sampling density: a reduction of the variance is possible only by conditioning on "small"
subsets of the sample paths space. On no classes of subsets of $\mathbb{R}^n$ with probability going to 1 under the sampling is
it possible to reduce the variability of the estimate, whose rate is definitely proportional to $\sqrt{n}$, imposing a burden
of order $L\sqrt{n}\,\alpha$ in order to achieve a relative efficiency of $\alpha\%$ with respect to $P_n$.

5.2 Efficiency of the adaptive twisted scheme 

We first put forward a Lemma which asserts that large sets under the sampling distribution $G_{nA}$ carry all that is needed to achieve a dramatic improvement of the relative efficiency of the IS procedure. Its proof is deferred to the Appendix.

Lemma 9 Assume $k/n \to 1$. Then there exist sets $C_n$ in $\mathbb{R}^n$ such that

• $\lim_{n\to\infty} G_{nA}(C_n) = 1$;

• when $a \to EU$ (moderate deviation),

\[ t^k s(t^k) = a\,(1 + o(1)); \tag{36} \]

• when $\lim_{n\to\infty} a_n$ is larger than $EU$ (large deviation), $t^k s(t^k)$ remains bounded away from 0 and infinity.



We now evaluate the Mean Square Error of the adaptive twisted IS algorithm on this family of sets. Let 




We prove that 



Proposition 10 The relative accuracy of the estimate $\widehat{P_n}$ is given by

\[ RE(\widehat{P_n}) = \frac{\sqrt{2\pi}\sqrt{n-k-1}}{L}\, a\,(1 + o(1)) \]

as $n$ tends to infinity.
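Proof. Using the definition of $C_n$ we get

\begin{align*}
E_{g_{nA}}&\left[ 1_{C_n}(Y_1^n)\, 1_{\mathcal{E}_n}(Y_1^n) \left( \frac{p_X(Y_1^k)\, p_X(Y_{k+1}^n)}{g_{nA}(Y_1^k)\, g_{nA}(Y_{k+1}^n \mid Y_1^k)} \right)^2 \right] \\
&\le P_n (1+\delta_n)\, E_{p_{nA}}\left[ 1_{C_n}(Y_1^n)\, \frac{p_X(Y_1^k)}{p(Y_1^k \mid \mathcal{E}_n)}\, \frac{p_X(Y_{k+1}^n)}{g_{nA}(Y_{k+1}^n \mid Y_1^k)} \right] \\
&= P_n^2 (1+\delta_n)\, E_{p_{nA}}\left[ 1_{C_n}(Y_1^n)\, \frac{1}{p(\mathcal{E}_n \mid Y_1^k)}\, \frac{p_X(Y_{k+1}^n)}{g_{nA}(Y_{k+1}^n \mid Y_1^k)} \right] \\
&= P_n^2 \sqrt{2\pi}\sqrt{n-k-1}\; E_{p_{nA}}\left[ 1_{C_n}(Y_1^n)\, t^k s(t^k) \right] (1 + o(1)) \\
&= P_n^2\, a \sqrt{2\pi}\sqrt{n-k-1}\,(1 + o(1)).
\end{align*}

The third line is Bayes formula. The fourth line follows from the lemma on tail probabilities stated in the Appendix. The fifth line uses (36) and uniformity in Lemma 11, where the conditions in Corollary 6.1.4 of [Jensen 1995] are easily checked since, in his notation, $J(\theta) = \mathbb{R}$, condition (i) holds for $\theta$ in a neighborhood of 0 ($\Theta_0$ indeed is restricted to such a set in our case), (ii) clearly holds and (iii) is a consequence of the assumption on the characteristic function of $u(X_1)$. ■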



6 Simulation results 
6.1 The gaussian case 

The random variables $X_i$'s are i.i.d. with normal distribution with mean 0 and variance 1. The case treated here is $P_n = P(S_{1,n}/n > a) = 0.009972$ with $n = 100$ and $a = 0.232$. We build the curve of the estimate of $P_n$ (solid lines) and the two sigma confidence interval (dotted lines) with respect to $k$. The value of $L$ is $L = 2000$.




Figure 5: Curve of $P_n$ (solid line) in the normal case along with the two sigma confidence interval (dotted lines) as function of $k$ with $n = 100$ for $L = 2000$ instances.



6.2 The exponential case 

The random variables $X_i$'s are i.i.d. with exponential distribution with parameter 1 on $(-1, \infty)$. The case treated here is $P_n = P(S_{1,n}/n > a) = 0.013887$ with $n = 100$ and $a = 0.232$. The solid line is the estimate of $P_n$, the dotted lines are the two sigma confidence interval. Abscissa is $k$.

Figure 6: Curve of $P_n$ (solid line) in the exponential case along with the two sigma confidence interval (dotted lines) as function of $k$ with $n = 100$ for $L = 2000$ instances.



Figure [7] shows the ratio of the empirical value of the MSE of the adaptive estimate w.r.t. the empirical MSE 
of the i.i.d. twisted one, in the exponential case with P„ — 10^^ and n = 100. The value of k is growing from 
$k = 0$ (i.i.d. twisted sample) to $k = 70$ (according to the rule of section [s]). This ratio stabilizes to $\sqrt{n-k}/\sqrt{n}$ for $L = 2000$. The abscissa is $k$ and the solid line is $k \mapsto \sqrt{n-k}/\sqrt{n}$.
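The limiting ratio quoted above is elementary to tabulate. The following Python sketch evaluates $\sqrt{n-k}/\sqrt{n}$ on a few values of $k$; the function name `mse_ratio` and the chosen grid of $k$ are illustrative assumptions, not part of the simulation study.

```python
import math


def mse_ratio(n, k):
    """Ratio sqrt(n - k) / sqrt(n) quoted in the text for the MSE comparison."""
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    return math.sqrt(n - k) / math.sqrt(n)


if __name__ == "__main__":
    # k = 0 corresponds to the i.i.d. twisted sample, for which the ratio is 1.
    for k in (0, 10, 30, 50, 70):
        print(k, round(mse_ratio(100, k), 4))
```

With $n = 100$ the ratio decreases from 1 at $k = 0$ to about 0.55 at $k = 70$.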




Figure 7: Ratio of the empirical value of the MSE of the adaptive estimate w.r.t. the empirical MSE of the i.i.d. 
twisted one (dotted line) along with the true value of this ratio (solid line) as a function of k. 



6.3 A comparison study with the classical twisted IS scheme 

This section compares the performance of the present approach with respect to the standard tilted one as described 
in Section [T] 

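Consider a random sample $X_1, \dots, X_{100}$ where $X_1$ has a normal distribution $N(0.05, 1)$ and let

\[ \mathcal{E}_{100} := \left\{ x_1^{100} : \frac{|x_1 + \dots + x_{100}|}{100} > 0.28 \right\} \]

for which

\[ P_{100} = P\big( (X_1, \dots, X_{100}) \in \mathcal{E}_{100} \big) = 0.01120. \]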
Our interest is to show that in this simple asymmetric case a direct extension of our proposal provides a good estimate, while the standard IS scheme ignores a part of the event $\mathcal{E}_{100}$. The standard i.i.d. IS scheme introduces the dominating point $a = 0.28$ and the family of i.i.d. tilted r.v's with common $N(a, 1)$ distribution. The resulting estimator of $P_{100}$ is 0.01074 (with $L = 1000$), indicating that the event $S_{1,100}/100 < -0.28$ is ignored in the evaluation of $P_{100}$, inducing a bias in the estimation. Since the simulated r.v's are independent under the tilted distribution, the Importance factor oscillates wildly. Also the hit rate is of order 50%. It can also be seen that $S_{1,100}/100 < -0.28$ is never visited through the procedure.
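The size of the part of the event that the standard scheme misses can be checked directly, since $S_{1,100}/100$ is exactly $N(0.05, 1/100)$ in this toy case. The Python sketch below computes the two tails separately; the function names are hypothetical and only the standard library is used.

```python
import math


def normal_sf(x):
    """Survival function P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def two_sided_probability(n=100, mu=0.05, threshold=0.28):
    """Upper and lower tail of P(|S_{1,n}| / n > threshold), X_i ~ N(mu, 1)."""
    sd = 1.0 / math.sqrt(n)  # S_{1,n} / n is N(mu, 1/n)
    upper = normal_sf((threshold - mu) / sd)  # P(S/n >  threshold)
    lower = normal_sf((threshold + mu) / sd)  # P(S/n < -threshold)
    return upper, lower


if __name__ == "__main__":
    upper, lower = two_sided_probability()
    print("upper tail:", upper)
    print("lower tail:", lower)
    print("total     :", upper + lower)
```

The upper tail alone is close to the value returned by the standard scheme, while the sum of the two tails agrees with the stated value of $P_{100}$.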

This example is not as artificial as it may seem; indeed it leads to a situation with two dominating points, which is quite often met in real life. Exploring at random the set of interest under the distribution of $(x_1 + \dots + x_{100})/100$ under $\mathcal{E}_{100}$ avoids any search for dominating points. A further paper explores the advantage of this method, which already proves to compare favorably with usual methods on $\mathbb{R}$.

Drawing $L$ i.i.d. points $v_1, \dots, v_L$ according to the distribution of $S_{1,100}/100$ conditionally upon $|S_{1,100}|/100 > 0.28$ we evaluate $P_{100}$ with $k = 99$; note that in the gaussian case Theorem 1 provides an exact description of the conditional density of $X_1^k$ for all $k$ between 1 and $n$, and therefore the same nearly holds in Theorem 2. Simulating the $v_i$'s in this toy case is easy; just simulate samples $X_1, \dots, X_{100}$ under $N(0.05, 1)$ until $\mathcal{E}_{100}$ is reached. The resulting value of the estimate is 0.01125 which is fairly close to $P_{100}$.
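The rejection step described above ("simulate samples until $\mathcal{E}_{100}$ is reached") can be sketched as follows. This covers only the drawing of the points $v_i$, not the adaptive estimator itself; the function name, the seed and the small value of $L$ used in the example are assumptions made for illustration.

```python
import random


def draw_conditional_means(L, n=100, mu=0.05, threshold=0.28, seed=0):
    """Draw L values of S_{1,n}/n conditionally on |S_{1,n}|/n > threshold.

    Plain rejection: samples X_1, ..., X_n are simulated under N(mu, 1)
    and kept only when the event is reached.
    """
    rng = random.Random(seed)
    draws = []
    while len(draws) < L:
        mean = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if abs(mean) > threshold:
            draws.append(mean)
    return draws


if __name__ == "__main__":
    vs = draw_conditional_means(L=20)
    print(min(vs), max(vs))
```

Since the event has probability about 0.011, roughly ninety samples are rejected for each accepted one, which is affordable only because this is a toy case.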

As expected the Importance factor is very close to $P_{100}$ for all sample paths $X_1^n$ simulated under $G_{nA}$; this is in accordance with Theorem 1. Also the hit rate is very close to 100%.

The histograms pertaining to the Importance factor are as follows (Figures 12 and 13). 




Figure 8: Histogram of Importance Factor with $k = 1$ and $n = 100$ for $L = 1000$ instances.

It is also interesting to draw the hit rate as a function of $k$. When $k = 1$ this rate is close to 50%, since the present algorithm coincides with the classical i.i.d. IS scheme. As $k$ increases, the hit rate approaches 100%; the value of $L$ is 1000.
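The 50% hit rate of the i.i.d. tilted scheme can be reproduced with a short simulation. The sketch below assumes the setting of this section (tilted sampling under $N(0.28, 1)$, event $|S_{1,100}|/100 > 0.28$); the function name and seed are illustrative.

```python
import random


def tilted_hit_rate(L=1000, n=100, a=0.28, seed=0):
    """Fraction of i.i.d. N(a, 1) samples whose mean satisfies |mean| > a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(L):
        mean = sum(rng.gauss(a, 1.0) for _ in range(n)) / n
        if abs(mean) > a:
            hits += 1
    return hits / L


if __name__ == "__main__":
    print("hit rate under the tilted scheme:", tilted_hit_rate())
```

Under the tilted density the sample mean is symmetric around $a$, which explains why about one sample in two hits the event.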



7 Appendix 

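The following lemma provides an asymptotic formula for the tail probability of $U_{1,n}$ under the hypotheses and notations of section 3. Define

\[ I_U(x) := x\, m^{-1}(x) - \log \phi_U\big(m^{-1}(x)\big). \]

Lemma 11 (see [Jensen 1995], Corollary 6.4.1) Under the same hypotheses and notations as section 3,

\[ P\left( \frac{U_{1,n}}{n} > a \right) = \frac{\exp(-n I_U(a))}{\sqrt{2\pi}\sqrt{n}\,\psi(a)} \left( 1 + O\!\left( \frac{1}{\sqrt{n}} \right) \right) \]

where $\psi(a) := m^{-1}(a)\, s(m^{-1}(a))$.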
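In the Gaussian case with $u(x) = x$ the leading term of this approximation can be compared with the exact tail. The Python sketch below assumes that the leading term has the form $\exp(-nI_U(a))/(\sqrt{2\pi}\sqrt{n}\,\psi(a))$ and uses $I_U(a) = a^2/2$, $m^{-1}(a) = a$ and $s \equiv 1$, which hold for standard normal summands; the function names are hypothetical.

```python
import math


def exact_tail(n, a):
    """Exact P(U_{1,n}/n > a) for i.i.d. N(0,1) summands."""
    return 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))


def saddlepoint_tail(n, a):
    """Leading term exp(-n I(a)) / (sqrt(2 pi) sqrt(n) psi(a)), Gaussian case.

    Assumptions: I(a) = a^2 / 2, m^{-1}(a) = a and s = 1, so psi(a) = a.
    """
    rate = 0.5 * a * a
    psi = a
    return math.exp(-n * rate) / (math.sqrt(2.0 * math.pi) * math.sqrt(n) * psi)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, saddlepoint_tail(n, 0.232) / exact_tail(n, 0.232))
```

The printed ratio approaches 1 from above as $n$ grows, in line with the multiplicative error term of the lemma.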
Figure 9: Histogram of Importance Factor with $k = 99$ and $n = 100$ for $L = 1000$ instances.




Figure 10: Curve of the hit Rate as a function of k with n = 100 for L = 1000 instances. 
7.1 Proof of Theorem [2] 

7.1.1 Two Lemmas pertaining to the partial sum under its final value 

Lemma 12 Suppose that (V) holds. Then (i) $E_{P_{nA}} U_1 = a + o(1)$, (ii) $E_{P_{nA}} U_1^2 = a^2 + s^2(m^{-1}(a)) + o(1)$ and (iii) $E_{P_{nA}} U_1 U_2 = a^2 + o(1)$.

Proof. We make use of Lemma 23 of [Broniatowski and Caron 2011], meaning $E_{P_{nv}}[U_1] = v$. It holds

\[ E_{P_{nA}} U_1 = \int_a^{\infty} \big( E_{P_{nv}} U_1 \big)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv. \]

Integration by parts yields

\[ E_{P_{nA}} U_1 = a + \int_a^{\infty} P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv. \]

Using Lemma 11 and Chernoff inequality,

\[ \int_a^{\infty} P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv \le \sqrt{2\pi}\,\psi(a)\sqrt{n} \int_a^{\infty} \exp\big[ n\,(I_U(a) - I_U(v)) \big]\, dv \]

where $\psi(a)$ is defined in Lemma 11.

Finally, using $I_U(v) \ge I_U'(a)\,v + I_U(a) - a\,I_U'(a)$, and integrating,

\[ \int_a^{\infty} P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv \le \frac{\sqrt{2\pi}\, s(m^{-1}(a))}{\sqrt{n}}. \]

Hence, $E_{P_{nA}} U_1 = a + o(1)$.

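Insert $E_{P_{nv}} U_1^2 = v^2 + s^2(m^{-1}(v)) + O(1/n)$ in

\[ E_{P_{nA}} U_1^2 = \int_a^{\infty} E_{P_{nv}} U_1^2 \; p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv. \]

Firstly, by integration by parts, Lemma 11 and Chernoff inequality,

\[ \int_a^{\infty} v^2\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv = a^2 + o(1). \]

Indeed, since (C) implies $n\,m^{-1}(a) \to \infty$ when $n$ tends to $\infty$, it holds

\[ \int_a^{\infty} v\, P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv \le \frac{\sqrt{2\pi}\, s(m^{-1}(a))}{\sqrt{n}} \left( a + \frac{1}{n\,m^{-1}(a)} \right). \]

Secondly,

\[ \int_a^{\infty} V(v)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv = s^2(m^{-1}(a)) + \int_a^{\infty} V'(v)\, P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv. \]

Using Lemma 11, Chernoff inequality and $I_U(v) \ge I_U'(a)\,v + I_U(a) - a\,I_U'(a)$, it holds under condition (V),

\[ \int_a^{\infty} V'(v)\, P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv \le \sqrt{2\pi}\sqrt{n}\; m^{-1}(a)\, s(m^{-1}(a)) \int_a^{\infty} V'(v) \exp\big( -n\,m^{-1}(a)(v-a) \big)\, dv \]

and

\[ \int_a^{\infty} V(v)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv = s^2(m^{-1}(a)) + o(1). \]

The third term is handled similarly due to the fact that the $O(1/n)$ consists in a sum of powers of $v$.
For $E_{P_{nA}} U_1 U_2 = a^2 + o(1)$, the proof is similar. ■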

Lemma 12 yields the maximal inequality stated in Lemma 22 of [Broniatowski and Caron 2011] under the condition $(U_{1,n} > na)$. We also need the order of magnitude of the maximum of $(|U_1|, \dots, |U_k|)$ under $P_{nA}$, which is stated in the following result.

Lemma 13 It holds for all $k$ between 1 and $n$

\[ \max(|U_1|, \dots, |U_k|) = O_{P_{nA}}(\log n). \]

Proof. Using the same argument as in Lemma 23 of [Broniatowski and Caron 2011], we consider the case when the r.v's $U_i$ take non negative values. We prove that

\[ \lim_{n\to\infty} P_{nA}\big( \max(U_1, \dots, U_k) > t_n \big) = 0 \]

when

\[ \lim_{n\to\infty} \frac{t_n}{\log n} = \infty. \]

It holds

\begin{align*}
P_{nA}\big( \max(U_1, \dots, U_k) > t_n \big)
&= \int_a^{a+c} P_{nv}\big( \max(U_1, \dots, U_k) > t_n \mid U_{1,n}/n = v \big)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \\
&\quad + \int_{a+c}^{\infty} P_{nv}\big( \max(U_1, \dots, U_k) > t_n \mid U_{1,n}/n = v \big)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \\
&=: I + II.
\end{align*}

Now, using the same arguments as before,

\[ II \le \frac{P(U_{1,n}/n > a + c)}{P(U_{1,n}/n > a)} \le (1 + o(1))\, \frac{m^{-1}(a)\, s(m^{-1}(a))}{m^{-1}(a+c)\, s(m^{-1}(a+c))}\, \exp\big[ -nc\, m^{-1}(a) \big]. \]

Since $c$ is fixed and $m^{-1}(a)$ is bounded, $II \to 0$ under (C).

Furthermore by Lemma 23 of [Broniatowski and Caron 2011],

\[ \lim_{n\to\infty} P\big( \max(U_1, \dots, U_k) > t_n \mid U_{1,n}/n = v \big) =: \lim_{n\to\infty} r_n = 0 \]

when $v \in (a, a + c)$. Hence

\[ I \le r_n\,(1 + o(1)) \to 0. \]

This proves the Lemma. ■
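We now prove Theorem 2(i).

Step 1. We first prove that the integral (19) can be reduced to its principal part, namely that

\[ p_{nA}(Y_1^k) = (1 + o_{P_{nA}}(1)) \int_a^{a+c} p(X_1^k = Y_1^k \mid U_{1,n}/n = v)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \tag{37} \]

holds for any fixed $c > 0$.

Apply Bayes formula to obtain

\[ p_{nA}(Y_1^k) = \frac{p_X(Y_1^k)\; P\!\left( \dfrac{U_{k+1,n}}{n-k} > \dfrac{n}{n-k}\left( a - \dfrac{k\,\overline{U}_{1,k}}{n} \right) \right)}{P(U_{1,n} > na)} \]

where $\overline{U}_{1,k} := U_{1,k}/k$.
Denote

\[ I := \frac{P\!\left( \dfrac{U_{k+1,n}}{n-k} > m_k + \dfrac{nc}{n-k} \right)}{P\!\left( \dfrac{U_{k+1,n}}{n-k} > m_k \right)} \]

with

\[ m_k := \frac{n}{n-k}\left( a - \frac{k\,\overline{U}_{1,k}}{n} \right). \]

Then (37) holds whenever $I \to 0$ (under $P_{nA}$).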
Under PnA it holds 



1 



^nm ^{a) ^ 

A similar result as Lemma 22 holds under condition (Ui.„ > na), using Lemma 21; namely it holds 



Using both results, it holds 



max J7i+i „ = a + op„^ (e„) . 



TOfc = a + Op^^ (w„) 






with Vn — max ^e„, ^ which tends to under (C). 

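We now prove that $I \to 0$. Using once more Lemma 11 yields

\[ I \le (1 + o(1))\, \frac{m^{-1}(m_k)\, s(m^{-1}(m_k))}{m^{-1}\!\left(m_k + \frac{nc}{n-k}\right) s\!\left(m^{-1}\!\left(m_k + \frac{nc}{n-k}\right)\right)}\, \exp\left( -(n-k)\left[ I_U\!\left( m_k + \frac{nc}{n-k} \right) - I_U(m_k) \right] \right). \]

Now by convexity of the function $I_U$, and (38),

\begin{align*}
\exp\left( -(n-k)\left[ I_U\!\left( m_k + \frac{nc}{n-k} \right) - I_U(m_k) \right] \right)
&\le \exp\big( -nc\, m^{-1}(m_k) \big) \\
&= \exp\left( -nc \left[ m^{-1}(a) + \frac{1}{V\big(a + \theta\, O_{P_{nA}}(v_n)\big)}\, O_{P_{nA}}(v_n) \right] \right)
\end{align*}

for some $\theta$ in $(0,1)$, which tends to 0 under $P_{nA}$ when (A) and (C) hold. By monotonicity of $t \mapsto m(t)$ and condition (C) the ratio in $I$ is bounded.
We have proved that

\[ I = O_{P_{nA}}\big( \exp(-nc\, m^{-1}(a)) \big). \]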

Step 2. Theorem 1(i) holds uniformly in $v$ in $(a, a+c)$ where $Y_1^k$ is generated under $P_{nA}$. This result follows from a similar argument as used in Theorem 1 where (22) is proved under the local sampling $P_{nv}$. A close look at the proof shows that (22) holds whenever Lemmas 22 and 23, stated in [Broniatowski and Caron 2011] for the variables $U_i$'s instead of $X_i$'s, hold under $P_{nA}$. Those lemmas are substituted by Lemmas 12 and 13 here above.

Inserting (22) in (37) yields

\[ p_{nA}(Y_1^k) = \left( \int_a^{a+c} g_{nv}(Y_1^k)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \right) \left( 1 + o_{P_{nA}}\!\left( \max\left( \varepsilon_n (\log n)^2, \big( \exp(-nc\, m^{-1}(a)) \big)^{\delta} \right) \right) \right) \]

for any positive $\delta < 1$.

The conditional density of $U_{1,n}/n$ given $(U_{1,n} > na)$ is given in (30), which holds uniformly in $v$ on $(a, a+c)$.

Summing up we have proved

\[ p_{nA}(Y_1^k) = \left( n\,m^{-1}(a) \int_a^{a+c} g_{nv}(Y_1^k) \exp\big( -n\,m^{-1}(a)(v-a) \big)\, dv \right) \left( 1 + o_{P_{nA}}\!\left( \max\left( \varepsilon_n (\log n)^2, \big( \exp(-nc\, m^{-1}(a)) \big)^{\delta} \right) \right) \right) \]

as $n \to \infty$ for any positive $\delta$.

In order to get the approximation of $p_{nA}$ by the density $g_{nA}$ it is enough to observe that

\[ n\,m^{-1}(a) \int_a^{a+c} g_{nv}(Y_1^k) \exp\big( -n\,m^{-1}(a)(v-a) \big)\, dv = g_{nA}(Y_1^k) \left( 1 + o_{P_{nA}}\big( \exp(-nc\, m^{-1}(a)) \big) \right) \]

as $n \to \infty$, which completes the proof of (22). The proof of (23) follows from (22) and Lemma 14 cited hereunder.

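The following Lemma proves that approximating $p_{nA}$ by $g_{nA}$ under $p_{nA}$ is similar to approximating $p_{nA}$ by $g_{nA}$ under $g_{nA}$.

Let $\mathfrak{R}_n$ and $\mathfrak{S}_n$ denote two p.m's on $\mathbb{R}^n$ with respective densities $r_n$ and $s_n$.
Lemma 14 Suppose that for some sequence $\varepsilon_n$ which tends to 0 as $n$ tends to infinity

\[ r_n(Y_1^n) = s_n(Y_1^n)\big( 1 + o_{\mathfrak{R}_n}(\varepsilon_n) \big) \]

as $n$ tends to $\infty$. Then

\[ s_n(Y_1^n) = r_n(Y_1^n)\big( 1 + o_{\mathfrak{S}_n}(\varepsilon_n) \big). \]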






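Proof. Denote

\[ A_{n,\delta\varepsilon_n} := \big\{ y_1^n : (1 - \delta\varepsilon_n)\, s_n(y_1^n) \le r_n(y_1^n) \le s_n(y_1^n)\,(1 + \delta\varepsilon_n) \big\}. \]

It holds for all positive $\delta$

\[ \lim_{n\to\infty} I(n,\delta) = 1 \]

where

\[ I(n,\delta) := \mathfrak{R}_n\big( A_{n,\delta\varepsilon_n} \big). \]

Since

\[ I(n,\delta) \le (1 + \delta\varepsilon_n)\, \mathfrak{S}_n\big( A_{n,\delta\varepsilon_n} \big) \]

it follows that

\[ \lim_{n\to\infty} \mathfrak{S}_n\big( A_{n,\delta\varepsilon_n} \big) = 1 \]

which proves the claim. ■

7.2 Proof of Lemma 9

Assume $k/n \to 1$. Let $C_n$ in $\mathbb{R}^n$ be such that for all $y_1^n$ in $C_n$,

\[ \left| \frac{p_{nA}(y_1^k)}{g_{nA}(y_1^k)} - 1 \right| < \delta_n \]

with $\delta_n$ as in (24) and

\[ \left| \frac{m(t^k)}{a} - 1 \right| < \alpha_n \]

where $t^k$ is defined through

\[ m(t^k) = \frac{n}{n-k}\left( a - \frac{u_{1,k}}{n} \right) \]

with $u_{1,k} := \sum_{i=1}^{k} u(y_i)$ and $\alpha_n$ satisfies

\[ \lim_{n\to\infty} \alpha_n = 0 \]

together with

\[ \lim_{n\to\infty} \alpha_n\, a \sqrt{n-k} = \infty. \]

We prove that

\[ \lim_{n\to\infty} G_{nA}(C_n) = 1. \]

Let

\[ A_{n,\varepsilon_n} := \left\{ x_1^k : \left| \frac{p_{nA}(x_1^k)}{g_{nA}(x_1^k)} - 1 \right| < \delta_n \right\}. \]

By the above definition

\[ \lim_{n\to\infty} P_{nA}\big( A_{n,\varepsilon_n} \big) = 1. \tag{41} \]

Note also that

\begin{align*}
G_{nA}\big( A_{n,\varepsilon_n} \big) &:= \int 1_{A_{n,\varepsilon_n}}(x_1^k)\, g_{nA}(x_1^k)\, dx_1^k \\
&\ge \frac{1}{1+\delta_n} \int 1_{A_{n,\varepsilon_n}}(x_1^k)\, p_{nA}(x_1^k)\, dx_1^k \\
&= \frac{1}{1+\delta_n}\,(1 + o(1))
\end{align*}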
which goes to 1 as $n$ tends to $\infty$. We have just proved that the sequence of sets $A_{n,\varepsilon_n}$ contains roughly all the sample paths $X_1^n$ under the importance sampling density $g_{nA}$.
We use the fact that $t^k$ defined through



is close to $a$ under $p_{nv}$ uniformly upon $v$ in $(a, a+c)$ and integrate out with respect to the distribution of $U_{1,n}/n$ conditionally on $U_{1,n}/n \in (a, a+c)$.

Let 6n tend to and lim„_yoo ciUnV n — k = oo and 



Bn : 



m(t'=) 



1 



We prove that on $B_n$

\[ t^k s(t^k) = a\,(1 + o(1)) \tag{42} \]

holds.

By Lemma 22 in [Broniatowski and Caron 2011] and integrating w.r.t. $p_{nv}$ on $(a, a+c)$ it holds, under (39) and (40),

\[ \lim_{n\to\infty} P_{nA}(B_n) = 1. \tag{43} \]



There exists 5^ such that for any in B„ 



Indeed 



and lim„_j.oo Vk = 0. Therefore 



<5L. 



Vkt^ Vkt^ 

1 a„ < — < 1 h ar, 

a a a 



Since is bounded so is — and therefore 



as n — > cx) which implies ( 44 ) 



Further (44) implies that there exists such that 



- 1 



Indeed 



- 1 



where lim„_j.oo Uk — 0. Therefore (42) holds. 
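Define

\[ C_n := B_n \cap A_{n,\varepsilon_n}. \]

Since

\[ \int 1_{C_n}(x_1^k)\, g_{nA}(x_1^k)\, dx_1^k \ge \frac{1}{1+\delta_n} \int 1_{C_n}(x_1^k)\, p_{nA}(x_1^k)\, dx_1^k \]

and by (41) and (43)

\[ \lim_{n\to\infty} P_{nA}(C_n) = 1 \]

we obtain

\[ \lim_{n\to\infty} G_{nA}(C_n) = 1, \]

which concludes the proof of (i) and (ii).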